Recycling Terms into a Partial Parser

Author

  • Christian Jacquemin
Abstract

Both full-text information retrieval and large-scale parsing require text preprocessing to identify strong lexical associations in textual databases. In order to associate linguistic felicity with computational efficiency, we have conceived FASTR, a unification-based parser supporting large textual and grammatical databases. The grammar is composed of term rules obtained by tagging and lemmatizing term lists with an online dictionary. Through FASTR, large terminological data can be recycled for text processing purposes. Great stress is placed on the handling of term variations through metarules which relate basic terms to their semantically close morphosyntactic variants. The quality of terminological extraction and the computational efficiency of FASTR are evaluated through a joint experiment with an industrial documentation center. The processing of two large technical corpora shows that the application is scalable to such industrial data and that accounting for term variants results in an increase of recall by 20%. Although automatic indexing is the most straightforward application of FASTR, it can be extended fruitfully to terminological acquisition and compound interpretation.
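The idea of deriving variants from basic terms can be illustrated with a minimal sketch. It is not FASTR's formalism: the abstract describes a unification-based grammar, whereas the code below swaps in plain pattern matching over lemma sequences. The tiny lemma table, the function names, and the three variant patterns (coordination, insertion, permutation) are all assumptions chosen for illustration and are not taken from the abstract. The sketch handles only two-word terms and text without punctuation.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lemma table; a real system would rely on a full dictionary.
LEMMAS: Dict[str, str] = {"cells": "cell"}


def lemmatize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and map each word form to its lemma."""
    return [LEMMAS.get(word, word) for word in text.lower().split()]


def find_variants(term: str, text: str) -> List[Tuple[str, str]]:
    """Return (variant_type, lemmatized_span) pairs for a two-word term.

    Each pattern below plays the role of a simplified "metarule" that maps
    the basic term "first second" to one assumed variant shape.
    """
    term_lemmas = lemmatize(term)
    if len(term_lemmas) != 2:
        raise ValueError("this sketch only handles two-word terms")
    first, second = term_lemmas
    tokens = lemmatize(text)
    hits: List[Tuple[str, str]] = []
    for i in range(len(tokens)):
        window = tokens[i:i + 4]
        # Basic term: "first second"
        if window[:2] == [first, second]:
            hits.append(("basic", " ".join(window[:2])))
        # Coordination (assumed pattern): "first and X second"
        if (len(window) == 4 and window[0] == first
                and window[1] == "and" and window[3] == second):
            hits.append(("coordination", " ".join(window)))
        # Insertion (assumed pattern): "first X second", X not a conjunction
        if (len(window) >= 3 and window[0] == first
                and window[2] == second and window[1] != "and"):
            hits.append(("insertion", " ".join(window[:3])))
        # Permutation (assumed pattern): "second of first"
        if window[:3] == [second, "of", first]:
            hits.append(("permutation", " ".join(window[:3])))
    return hits


if __name__ == "__main__":
    sample = ("Blood cells and blood mononuclear cells differ from "
              "cells of blood and from blood and bone cells")
    for kind, span in find_variants("blood cell", sample):
        print(kind, "->", span)
```

An index built only from the basic pattern would find a single occurrence in the sample sentence, while the three extra patterns pick up further mentions of the same term. That is the kind of recall gain the abstract attributes to variant handling, although the figures reported there come from the authors' own experiment and not from a sketch like this one.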


Similar resources

Evaluation Results of Concept-based Translator with Partial Parsing

In this paper, we describe an evaluation of the output of a translator that uses concept-based grammars. This translator translates the Korean sentence generated by a speech recognizer into an English sentence through a concept analysis approach. A partial parsing function was added to the translator and yielded an improvement because the performance of the parser (whole parser) is low in the sta...


Feature Engineering in Persian Dependency Parser

A dependency parser is one of the most important fundamental tools in natural language processing; it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsing is well suited to free-word-order languages such as Persian. In this paper, a data-driven dependency parser has been developed with the help of phrase-structure parser fo...


The Effect of Task Repetition and Task Recycling on EFL Learners' Oral Performance

One of the major criticisms leveled at task-based language teaching (TBLT), despite its countless merits, is developing fluency at the cost of accuracy. The post-task stage affords a number of options to counteract this downside through task repetition and task recycling. These two options are considered to positively affect learners' oral performance in terms of fluency, accuracy, and complexi...


Partial Training for a Lexicalized-Grammar Parser

We propose a solution to the annotation bottleneck for statistical parsing, by exploiting the lexicalized nature of Combinatory Categorial Grammar (CCG). The parsing model uses predicate-argument dependencies for training, which are derived from sequences of CCG lexical categories rather than full derivations. A simple method is used for extracting dependencies from lexical category sequences, ...


Albany: A Component-Based Partial Differential Equation Code Built on Trilinos

[Figure labels from a component diagram: Discretization; Application; Linear Solve; Load Balancing; Input Parser; PDE Terms, BCs, Responses; Libraries]




Publication date: 1994